Enabling Ray Cluster #1501
Add error handling, node watcher, and auto checkpoint_path var
from threading import Thread
from metaflow.exception import MetaflowException
from metaflow.unbounded_foreach import UBF_CONTROL
from metaflow.plugins.parallel_decorator import ParallelDecorator, _local_multinode_control_task_step_func
Small change proposal: importing an internal method (_local_multinode_control_task_step_func) might lead to breaking changes in the future. Could this be done with only ParallelDecorator, relying on super() in task_decorate?
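A minimal sketch of the reviewer's suggestion, under stated assumptions: the classes and the setup_ray hook below are hypothetical stand-ins, not Metaflow's real ParallelDecorator (whose task_decorate signature and internals differ between versions). The point is the pattern: the subclass wraps the user's step with its own setup and then delegates to super().task_decorate, so it never has to import the parent's private helpers.

```python
from typing import Callable, List


# Hypothetical stand-in for Metaflow's ParallelDecorator; the real class's
# task_decorate signature may differ between Metaflow versions.
class BaseParallelDecorator:
    def task_decorate(
        self, step_func: Callable[[], None], ubf_context: str
    ) -> Callable[[], None]:
        # The base decorator owns any multinode plumbing (for example,
        # launching worker tasks locally); this stand-in just returns the step.
        return step_func


class RayParallelDecorator(BaseParallelDecorator):
    def __init__(self) -> None:
        self.events: List[str] = []

    def setup_ray(self) -> None:
        # Hypothetical hook: this is where the Ray head or worker would start.
        self.events.append("ray-setup")

    def task_decorate(
        self, step_func: Callable[[], None], ubf_context: str
    ) -> Callable[[], None]:
        # Wrap the user's step so Ray is set up first, then let the parent
        # class apply its own logic (including private helpers) via super().
        def step_with_ray() -> None:
            self.setup_ray()
            step_func()

        return super().task_decorate(step_with_ray, ubf_context)
```

Calling the wrapped step runs the Ray setup before the user code, while any internal machinery stays inside the parent class.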
break
except ImportError:
print("Ray is not installed. Installing latest version of ray-air package.")
subprocess.run([sys.executable, "-m", "pip", "install", "-U", "ray[air]"], check=True)
Is the -U necessary? Whenever a new Ray version is released, it could add overhead or cause breakage at task startup, even with custom-built images that already have Ray installed.
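One way to address this comment, sketched under assumptions: install only when the module cannot be found, and drop -U so pip leaves an existing Ray alone. The function name ensure_package and its runner parameter are hypothetical; runner defaults to subprocess.run and exists only so the install step can be swapped out.

```python
import importlib.util
import subprocess
import sys
from typing import Callable


def ensure_package(
    module_name: str,
    requirement: str,
    runner: Callable[..., object] = subprocess.run,
) -> bool:
    """Install `requirement` only if `module_name` is not importable.

    Returns True if an install was attempted. Without -U, pip does not
    upgrade an already-installed package, so images that ship Ray are
    left untouched when a new release comes out.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False  # already present, e.g. baked into a custom image
    runner([sys.executable, "-m", "pip", "install", requirement], check=True)
    return True
```

For example, ensure_package("ray", "ray[air]") skips pip entirely on images that already include Ray; pinning an exact version in the requirement string would make startups reproducible as well.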
The functionality of this PR now lives in https://github.com/outerbounds/metaflow-ray, so we can close this for now.
Overview
We need the ability to start a Ray cluster using AWS Batch multi-node parallel jobs.
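For context, Ray's on-premise setup starts a head node with `ray start --head --port=6379` and points every other node at it with `ray start --address=<head-ip>:6379`. The sketch below builds those commands for one node of a multi-node job. The function name is hypothetical, port 6379 is Ray's default and is assumed here, and --block is used on workers to keep their containers alive; this is not the PR's actual implementation.

```python
from typing import List

RAY_PORT = 6379  # Ray's default head port; assumed, configurable in practice


def ray_start_command(
    node_index: int, main_node_index: int, main_ip: str, port: int = RAY_PORT
) -> List[str]:
    """Return the `ray start` command this node should run (hypothetical helper)."""
    if node_index == main_node_index:
        # The main node becomes the Ray head; user code then runs here.
        return ["ray", "start", "--head", f"--port={port}"]
    # Worker nodes join the head and block so the container stays up.
    return ["ray", "start", f"--address={main_ip}:{port}", "--block"]
```

On AWS Batch, node_index, main_node_index, and main_ip could come from the AWS_BATCH_JOB_NODE_INDEX, AWS_BATCH_JOB_MAIN_NODE_INDEX, and AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS environment variables that multi-node parallel jobs provide.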
This PR adds the following:
- A @ray_parallel decorator that users can apply to spin up a Ray cluster from an AWS Batch multi-node parallel job. The decorator follows the official Ray documentation for setting up Ray on-premise.
What has changed
- A @ray_parallel decorator, modeled on the base @parallel decorator
Notes
Testing evidence